Contents

0.1 Introduction

Tumorigenesis is a stepwise process driven by a sequence of molecular changes that are described as pathways of cancer progression. Conjunctive Bayesian Networks (CBN) are probabilistic graphical models designed for the analysis and modeling of these pathways [1]. CBN models have evolved into different varieties such as CT-CBN [2], H-CBN [3], B-CBN [4], and R-CBN [5], each addressing different aspects of this task. However, the software corresponding to these methods is not well integrated because they are implemented in different languages with heterogeneous input and output formats. This necessitates a unifying platform that integrates these models and enables the standardization of input and output formats. Evam-tools [6] is an R package that takes the initial steps towards this end. However, it does not include the B-CBN model or the recently developed R-CBN algorithm, which focuses on robust inference of cancer progression pathways [5]. Importantly, the B-CBN and R-CBN algorithms for pathway quantification necessitate exhaustive consideration and weighting of all potential dependency structures (posets) within the mutational quartets. This requires reimplementation of the CBN models and adjustment of downstream pathway analysis and modeling functions. Therefore, here we introduce the CBN2Path R package that not only includes the original implementation of the CBN models (e.g., CT-CBN and H-CBN) in a unifying interface but also accommodates the necessary modifications to support robust CBN algorithms (e.g., B-CBN and R-CBN). Importantly, CBN2Path includes a collection of functions required to quantify predictability [7], analyze robustness [5], and visualize mutational pathways from pre-processed cross-sectional genomic data. It is important to note that the R-CBN method has great potential for wide application in future predictive models because of its unique ability to offer an optimal balance between robustness and predictability [5]. Thus, we anticipate that CBN2Path will be a commonly used package in the field, particularly by providing a platform to facilitate future applications of the R-CBN model.

0.2 Installation

The package can be installed as follows:

if (!requireNamespace("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("CBN2Path")
library(CBN2Path)

0.3 Cite our work

If you use the CBN2Path package, please cite the paper formally as follows:

William Choi-Kim and Sayed-Rzgar Hosseini. CBN2Path: an R/Bioconductor package for the analysis of cancer progression pathways using Conjunctive Bayesian Networks. F1000Research (In Submission).

0.4 The CT-CBN model

CBN2Path provides the R interface for the continuous-time CBN model (CT-CBN), which was originally implemented in C programming language [2].

CT-CBN needs the following three inputs:

  1. numMutations: The number of mutations to be considered.

  2. poset: The partially ordered set (poset), which is represented as a two-column matrix each row of which indicate that mutation in the first column must occur before the mutation in the second column.

  3. genotypeMatrix: a binary matrix with n rows and m columns. Each row corresponds to a given genotype in the sample. The first column must be always 1 and each of the other columns corresponds to a given mutation. Thus, m equals numMutations plus one.

Below, you can see an example on how these inputs are prepared to be used in the original implementation of the ctcbn model (ctcbnSingle):

# The poset
dag <- matrix(c(3, 3, 4, 4, 1, 2, 1, 2), 4, 2)

# The genotype matrix
set.seed(100)
gen_1 <- c(rep(0, 150), sample(c(0, 1), 25, replace = TRUE), rep(0, 25))
gen_2 <- c(rep(0, 175), sample(c(0, 1), 25, replace = TRUE))
gen_3 <- c(rep(0, 50), sample(c(0, 1), 100, replace = TRUE), rep(1, 50))
gen_4 <- c(sample(c(0, 1), 100, replace = TRUE), rep(0, 50), rep(1, 50))
g_mat <- matrix(c(gen_1, gen_2, gen_3, gen_4), 200, 4)
g_mat <- cbind(1, g_mat)

# Preparing input of the ct-cbn/h-cbn methods
bc <- Spock$new(
     poset = dag,
     numMutations = 4,
     genotypeMatrix = g_mat
)

# Running the ct-cbn model
results_c <- ctcbnSingle(bc)

Note that in the above example, we have generated a genotype matrix that perfectly matches the constraints specified in the given poset. In other words, in all genotypes in this sample of size 200, mutations 1 and 2 occur only if mutations 3 and 4 are already present.

It is important to note that in the above example, we have used the cbind function to make sure to have the first column always is 1 in all genotypes, and the other four columns respectively represent the mutations 1 to 4.

The maximum likelihood poset that the ct-cbn model outputs can be obtained as:

ml_poset <- results_c[[1]]$poset$sets

The directed acyclic graph representation of the ML poset can be visualized using the visualizeCBNModel function as follows:

visualizeCBNModel(ml_poset)
#> Using "sugiyama" as default layout

Furthermore, the Lambda parameters and the corresponding log likelihood can be obtained as:

ml_lambda <- results_c[[1]]$lambda
loglikelihood <- results_c[[1]]$summary[4]

You can repeat the above analysis using an imperfect genotype matrix (g_mat_mut) for which some of the genotypes violate the pre-specified constraints in the mutational orders. For this purpose, we use the genotypeMatrixMutator function, which subjects the original perfect genotype matrix (g_mat) to false-positive and false-negative error rates of 0.3 and 0.2, respectively:

temp <- g_mat[,2:5]
temp_mut <- genotypeMatrixMutator(temp, 0.3, 0.2)
g_mat_mut <- cbind(1, temp_mut)

and then you can rerun the ct-cbn model using the mutated genotype matrix as follows:

# The poset
dag <- matrix(c(3, 3, 4, 4, 1, 2, 1, 2), 4, 2)
# Preparing the inputs of the ct-cbn method
bc <- Spock$new(
     poset = dag,
     numMutations = 4,
     genotypeMatrix = g_mat_mut
)
# Running the ct-cbn model
results_c_mut <-ctcbnSingle(bc)

You can check whether the ML poset now is different from the original one obtained using the perfect genotype matrix:

ml_poset_mut <- results_c_mut[[1]]$poset$sets
visualizeCBNModel(ml_poset_mut)
#> [1] "This is an empty poset, so no need for visualization."

As you can see in the message above that the ML poset is an empty poset now, meaning that after adding the errors, the model is no longer able to detect any restrictions between the mutations.

Note that if the posets and genotype data are stored in the original format needed in the C implementation of the CBN models, you can preprocess those files using readPoset and readPattern functions in the CBN2Path package. You can see an example below, where the number of mutations is 10:

example_path <- getExamples()[1]
bc <- Spock$new(
     poset = readPoset(example_path)$sets,
     numMutations = readPoset(example_path)$mutations,
     genotypeMatrix = readPattern(example_path)
)
results_c2 <- ctcbnSingle(bc)

Finally, you can obtain and store the results using a list of posets (instead of a single poset) as the input of the model using the ctcbn function. In the example below, we will consider all 219 unique (transitively-closed) DAGs as our list of posets that we use as the input of the ctcbn function:

posets <- readRDS(system.file("extdata", "Posets.rds", package = "CBN2Path"))

bc <- Spock$new(
     poset = posets,
     numMutations = 4,
     genotypeMatrix = g_mat
)
results_c3 <- ctcbn(bc)

You can see that the result is a list of 219 components each including the estimated parameters corresponding to one of the 219 posets in the input list. This strategy is utilized in the R-CBN and B-CBN models for quantifying the pathway probabilities.

You can obtain the log likelihood corresponding to each poset as follows:

loglik <- numeric(219)
for (i in 1:219){
  loglik[i] <- results_c3[[i]]$summary[4]
}

You can verify that the maximum-likelihood poset is the same as the poset that ctcbnSingle outputs as the inferred poset (using the error-free g_mat genotype matrix).

indx <- which.max(loglik)
ml_poset_2 <- posets[[indx]]
identical(ml_poset_2, ml_poset)
#> [1] TRUE

0.5 The H-CBN model

The input/output structure of the H-CBN model (hcbnSingle and hcbn) [3] is exactly the same as in the CT-CBN model (ctcbnSingle and ctcbn) described above.

# The poset
dag <- matrix(c(3, 3, 4, 4, 1, 2, 1, 2), 4, 2)

# Preparing the inputs of the h-cbn method
bc <- Spock$new(
     poset = dag,
     numMutations = 4,
     genotypeMatrix = g_mat
)
# Running the h-cbn model
results_h <- hcbnSingle(bc)

The Lambda parameters and the corresponding log likelihood can be obtained as:

ml_lambda_h <- results_h[[1]]$lambda
loglikelihood_h <- results_h[[1]]$summary[4]

You can also rerun the h-cbn model using the mutated genotype matrix as follows:

# The poset
dag <- matrix(c(3, 3, 4, 4, 1, 2, 1, 2), 4, 2)
# Preparing the inputs of the h-cbn method
bc <- Spock$new(
     poset = dag,
     numMutations = 4,
     genotypeMatrix = g_mat_mut
)
# Running the h-cbn model
results_h_mut <- hcbnSingle(bc)

Again the poset and genotype files stored in the original formats can be processed using readPoset and readPattern functions in the CBN2Path package.

example_path <- getExamples()[1]
bc <- Spock$new(
     poset = readPoset(example_path)$sets,
     numMutations = readPoset(example_path)$mutations,
     genotypeMatrix = readPattern(example_path)
)
results_h2 <- hcbnSingle(bc)

Finally, you can obtain and store the results using a list of posets (instead of a single poset) as the input of the model using the hcbn function.

posets <- readRDS(system.file("extdata", "Posets.rds", package = "CBN2Path"))

bc <- Spock$new(
     poset = posets,
     numMutations = 4,
     genotypeMatrix = g_mat
)
results_h3 <- hcbn(bc)

You can obtain the log likelihood corresponding to each poset as follows:

loglik_h <- numeric(219)
for (i in 1:219){
  loglik_h[i] <- results_h3[[i]]$summary[4]
}

You can verify that the maximum-likelihood poset is the same as the poset that hcbnSingle outputs as the inferred poset (using the error-free g_mat genotype matrix).

ml_poset_h <- results_h_mut[[1]]$poset$sets

indx <- which.max(loglik_h)
ml_poset_h2 <- posets[[indx]]
identical(ml_poset_h2, ml_poset_h)
#> [1] TRUE

0.6 Analysis of cancer progression pathways

One of the important feature of the CBN2Path package is its emphasis on mutational pathway analyses and modeling. In this section, we will work with a set of functions that enable quantification, analysis and visualization of the mutational pathways.

There are two approaches to quantify the pathway probabilities:

  1. The first approach is to use the output of the ct-cbn or h-cbn methods (namely the estimated lambda parameters and the inferred ML poset) as input of the pathProbCBN function. This method is generic as it can be used for every number of mutations.

  2. The second approach works only for mutational quartets (n=4) and uses the genotypic data directly as the input. In this setting, each CBN model has its own pathway quantification functions: pathProbQuartetCTCBN, pathProbQuartetHCBN, pathProbQuartetRCBN, and pathProbQuartetBCBN.

As examples for the first approach, let’s use the results_c2 and results_h2 that we obtained in the previous section by learning respectively, ct-cbn and h-cbn models on genotypic data with n=10 mutations. First, we need to obtain the estimated lambda parameters and the inferred DAG and then use them as the input of the pathProbCBN function as follows:

lambda_c <- as.numeric(results_c2[[1]]$lambda)
lambda_h <- as.numeric(results_h2[[1]]$lambda)
dag_c <- results_c2[[1]]$poset$sets
dag_h <- results_h2[[1]]$poset$sets

prob_c <- pathProbCBN(dag_c, lambda_c, 10)
prob_h <- pathProbCBN(dag_h, lambda_h, 10)

Note that in the above code, probabilities of 10!=3,628,800 pathways need to be calculated, which is extremely time-consuming, so we have not executed this chunk of the code.

Now, let’s try the second approach using both the g_mat and g_mat_mut genotype matrices. Note that in these functions, the number of columns in the input genotype matrix must always be four, and the first column does not need to be an all-one column. Therefore, we must first eliminate the first column from the g_mat and g_mat_mut matrices.

g_mat2 <- g_mat[,2:5]
g_mat2_mut <-g_mat_mut[,2:5]

prob_c1 <- pathProbQuartetCTCBN(g_mat2)
prob_c2 <- pathProbQuartetCTCBN(g_mat2_mut)

prob_h1 <- pathProbQuartetHCBN(g_mat2)
prob_h2 <- pathProbQuartetHCBN(g_mat2_mut)

You can visualize the 24 pathways and their associated probabilities using visualizeProbabilities function. In the figures below, you can compare the pathway probability distributions (quantified using CT-CBN model) before and after errors:

visualizeProbabilities(prob_c1)

visualizeProbabilities(prob_c2)

You can see before adding errors (prob_c1), only four pathways are feasible with non-zero probabilities, but in the presence of error (prob_c2) all pathways are feasible and so the probability is more uniformly distributed among the pathways.

Now, let’s assume that the four genes in consideration are: “KRAS”, “TP53”, “CDKN2A”, “RREB1”. We can now visualize the pathways with gene names and their probability distributions (prob_c2) as follows:

gene_names <- c("KRAS", "TP53", "CDKN2A", "RREB1")
visualizeProbabilities(prob_c2, geneNames=gene_names)

Similarly, in the figures below, we can compare the pathway probability distributions (quantified using H-CBN model) before (prob_h1) and after errors (prob_h2):

visualizeProbabilities(prob_h1)

visualizeProbabilities(prob_h2)

We can see that under the H-CBN model, the pathway probabilities before and after errors stay more similar than what we observed under the CT-CBN model. We can formally quantify to what extent the two probability distribution differ using the Jensen-Shannon Divergence (JSD; implemented as jensenShannonDivergence function).

jsd_c <-jensenShannonDivergence(prob_c1, prob_c2)
jsd_h <-jensenShannonDivergence(prob_h1, prob_h2)
jsd_c
#> [1] 0.4141736
jsd_h
#> [1] 0.313736

We can also quantify the predictability for a given pathway probability distribution as described in [7] using the predictability function. We can see that the predictability after errors drops substantially under CT-CBN:

pred_c1 <-predictability(prob_c1, 4)
pred_c2 <-predictability(prob_c2, 4)
pred_c1
#> [1] 0.5662913
pred_c2
#> [1] 0.06947874
pred_c1 - pred_c2
#> [1] 0.4968126

In contrast, under H-CBN, the predictability after errors, even gets slightly higher than the one obtained in the error-free setting:

pred_h1 <- predictability(prob_h1, 4)
pred_h2 <-predictability(prob_h2, 4)
pred_h1
#> [1] 0.5662912
pred_h2
#> [1] 0.6972734
pred_h1 - pred_h2
#> [1] -0.1309822

Finally, we can compute the pathway compatibility scores both for g_mat2 and g_mat2_mutgenotype matrices using thepathwayCompatibilityQuartet function as follows:

pathway_c1<- pathwayCompatibilityQuartet(g_mat2)
pathway_c2 <- pathwayCompatibilityQuartet(g_mat2_mut)

The Spearman’s correlation coefficient between the pathway compatibility and the pathway probabilities quantified under CT-CBN or H-CBN can be quantified as follows:

rho_c1 <- cor(pathway_c1, prob_c1, method="spearman")
rho_c2 <- cor(pathway_c2, prob_c2, method="spearman")
rho_c1
#> [1] 0.6622063
rho_c2
#> [1] 0.956276
rho_h1 <- cor(pathway_c1, prob_h1, method="spearman")
rho_h2 <- cor(pathway_c2, prob_h2, method="spearman")
rho_h1
#> [1] 0.6622063
rho_h2
#> [1] 0.5464635

0.7 The R-CBN algorithm

The R-CBN algorithm [5] for quantifying pathway probability distributions is implemented in the pathProbQuartetRCBN function, whose input/output structure is similar to what we described above for the CT-CBN and H-CBN. However, there are multiple functions, which are called inside the pathProbQuartetRCBN function, and so the user do not need to directly work with (e.g. pathNormalization, pathwayWeightingRCBN, edgeMarginalized, pathEdgeMapper, and posetWeightingRCBN).

The pathProbQuartetRCBN function also receives a four-column binary genotype matrix as the only input, and outputs the corresponding pathway probability distribution. Let’s quantify the pathway probability distributions before and after adding errors:

g_mat2 <- g_mat[,2:5]
g_mat2_mut <- g_mat_mut[,2:5]

prob_r1 <- pathProbQuartetRCBN(g_mat2)
prob_r2 <- pathProbQuartetRCBN(g_mat2_mut)

In the figures below, you can compare the pathway probability distributions (quantified using R-CBN model) before and after errors:

visualizeProbabilities(prob_r1)

visualizeProbabilities(prob_r2)

Similar to what we described before, the Jensen-Shannon Divergence between the two distributions can be quantified as:

jsd_r <-jensenShannonDivergence(prob_r1, prob_r2)
jsd_r
#> [1] 0.05842678

You can see that the JDS value under R-CBN in this example (0.05), is considerably smaller than those of CT-CBN (0.41) and H-CBN (0.31).

The predictability values can also be compared as follows:

pred_r1 <- predictability(prob_r1, 4)
pred_r2 <- predictability(prob_r2, 4)
pred_r1
#> [1] 0.5397152
pred_r2
#> [1] 0.3825031
pred_r1 - pred_r2
#> [1] 0.1572121

Finally, the Spearman’s correlation coefficient between the pathway compatibility and the pathway probabilities quantified under R-CBN can be quantified as follows:

rho_r1 <- cor(pathway_c1, prob_r1, method="spearman")
rho_r2 <- cor(pathway_c2, prob_r2, method="spearman")
rho_r1
#> [1] 0.957629
rho_r2
#> [1] 0.981945

0.8 The B-CBN method

The workflow for the B-CBN algorithm [4], which is implemented in the pathProbQuartetBCBN function, is also similar to that of R-CBN.

The pathProbQuartetBCBN function also receives a four-column binary genotype matrix as the only input, and outputs the corresponding pathway probability distribution. Again, let’s quantify the pathway probability distributions before and after the errors:

g_mat2 <- g_mat[,2:5]
g_mat2_mut <- g_mat_mut[,2:5]

prob_b1 <- pathProbQuartetBCBN(g_mat2)
prob_b2 <- pathProbQuartetBCBN(g_mat2_mut)

In the figures below, you can compare the pathway probability distributions (quantified using B-CBN model) before and after errors:

visualizeProbabilities(prob_b1)

visualizeProbabilities(prob_b2)

Similar to what we described before, the Jensen-Shannon Divergence between the two distributions can be quantified as:

jsd_b <- jensenShannonDivergence(prob_b1, prob_b2)
jsd_b
#> [1] 0.07740306

The predictability values can also be compared as follows:

pred_b1 <- predictability(prob_b1, 4)
pred_b2 <- predictability(prob_b2, 4)
pred_b1
#> [1] 0.4620815
pred_b2
#> [1] 0.2691159
pred_b1 - pred_b2
#> [1] 0.1929657

Finally, the Spearman’s correlation coefficient between the pathway compatibility and the pathway probabilities quantified under B-CBN can be quantified as follows:

rho_b1 <- cor(pathway_c1, prob_b1, method="spearman")
rho_b2 <- cor(pathway_c2, prob_b2, method="spearman")
rho_b1
#> [1] 0.9088608
rho_b2
#> [1] 0.9641072

0.9 Analysis of fitness landscapes

If we can establish a fitness landscape by directly assigning fitness to all potential genotypes, we can employ evolutionary models to compute the mutational pathway probabilities under the Strong-Selection Weak-Mutation (SSWM) assumption [7], which is implemented in the pathProbSSWM function.

In case of 4 mutations, we will have 16 genotypes, so a fitness vector length of 16 is needed each element of which corresponds to a given genotype that can be determined by the generateMatrixGenotypes function. For example:

fitness_vector <- c(0, 0.1, 0.2, 0.1, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0, 0.6, 0.4, 0.3, 0.2, 1)
g <- generateMatrixGenotypes(4)

The 7-th genotype in the g vector is “1 0 1 0” and its corresponding fitness is fitness_vector[7]=0.3.

The fitness landscape can be visualized as follows:

visualizeFitnessLandscape(fitness_vector)

The pathway probability distribution under the SSWM-based model can be quantified as:

prob_w <- pathProbSSWM(fitness_vector,4)

Moreover, the pathway probabilities can be visualized as:

visualizeProbabilities(prob_w)

and finally the associated predictability can be quantified as:

pred_w <- predictability(prob_w, 4)

0.10 Session Info

sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS 15.3.1
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: America/Indiana/Indianapolis
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] CBN2Path_0.99.4  BiocStyle_2.34.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] viridis_0.6.5       sass_0.4.10         generics_0.1.3     
#>  [4] tidyr_1.3.1         lattice_0.22-7      digest_0.6.37      
#>  [7] magrittr_2.0.3      evaluate_1.0.3      grid_4.4.1         
#> [10] RColorBrewer_1.1-3  bookdown_0.43       iterators_1.0.14   
#> [13] fastmap_1.2.0       foreach_1.5.2       jsonlite_2.0.0     
#> [16] ggrepel_0.9.6       tinytex_0.57        doMC_1.3.8         
#> [19] gridExtra_2.3       BiocManager_1.30.25 purrr_1.0.4        
#> [22] viridisLite_0.4.2   scales_1.4.0        tweenr_2.0.3       
#> [25] codetools_0.2-20    jquerylib_0.1.4     cli_3.6.5          
#> [28] graphlayouts_1.2.2  rlang_1.1.6         polyclip_1.10-7    
#> [31] tidygraph_1.3.1     cowplot_1.1.3       withr_3.0.2        
#> [34] cachem_1.1.0        yaml_2.3.10         tools_4.4.1        
#> [37] parallel_4.4.1      memoise_2.0.1       coda_0.19-4.1      
#> [40] dplyr_1.1.4         ggplot2_3.5.2       vctrs_0.6.5        
#> [43] R6_2.6.1            magick_2.8.6        lifecycle_1.0.4    
#> [46] MASS_7.3-65         ggraph_2.2.1        pkgconfig_2.0.3    
#> [49] pillar_1.10.2       bslib_0.9.0         gtable_0.3.6       
#> [52] glue_1.8.0          Rcpp_1.0.14         ggforce_0.4.2      
#> [55] xfun_0.52           tibble_3.2.1        tidyselect_1.2.1   
#> [58] rstudioapi_0.17.1   knitr_1.50          dichromat_2.0-0.1  
#> [61] farver_2.1.2        patchwork_1.3.0     htmltools_0.5.8.1  
#> [64] igraph_2.1.4        labeling_0.4.3      rmarkdown_2.29     
#> [67] compiler_4.4.1

0.11 References

[1] Beerenwinkel, et al. Conjunctive Bayesian Networks. Bernoulli, 13(4):893–909, November 2007. ISSN 1350-7265. doi: https://doi.org/10.3150/07-BEJ6133.

[2] Beerenwinkel and Sullivant. Markov models for accumulating mutations. Biometrika, 96 (3):645–661, September 2009. ISSN 0006-3444, 1464-3510. doi: https://doi.org/10.1093/biomet/asp023.

[3] Gerstung, et al. Quantifying cancer progression with conjunctive Bayesian networks. Bioinformatics (Oxford, England), 25(21):2809–2815, November 2009. doi: https://doi.org/10.1093/bioinformatics/btp505.

[4] Sakoparnig and Beerenwinkel. Efficient sampling for Bayesian inference of conjunctive Bayesian networks. Bioinformatics, 28(18):2318–2324, September 2012. ISSN 1367-4811, 1367-4803. doi: https://doi.org/10.1093/bioinformatics/bts433.

[5] Hosseini. Robust inference of cancer progression pathways using Conjunctive Bayesian Networks, BioRxiv. July 2025. doi: https://doi.org/10.1101/2025.07.15.663924.

[6] Diaz-Uriarte and Herrera-Nieto. EvAM-Tools: tools for evolutionary accumulation and cancer progression models. Bioinformatics, 38(24): 5457–5459, December 2022. ISSN 1367-4803, 1367-4811. doi: https://doi.org/10.1093/bioinformatics/btac710.

[7] Hosseini, et al. Estimating the predictability of cancer evolution. Bioinformatics, 35 (14):i389–i397, July 2019. ISSN 1367-4803, 1367-4811. doi: https://doi.org/10.1093/bioinformatics/btz332.